3.3 Formatting dates and making sure loaded data has only Manhattan Borough entries
Here, we created a robust set of parse orders for mixed formats in both our datasets. We created cleaned date columns while keep original columns untouched. Also, we added new columns with month-year (Month name + Year) extracted which will be used for alluvial stacks later on
The date columns have been parsed properly and as we can see the both raw data we downloaded, only had data from the Manhattan borough since before and after filtering the number of rows remain same for both datastes i.e. 589005 rows in Housing Violations dataset and 425248 rows in 311 requests dataset.
3.4 Housing violation decription column data cleaning
[1] "(A) § HMC:FILE ANNUAL BEDBUG REPORT IN ACCORDANCE WITH HPD RULE AS DESCRIBED ON THE BACK OF THIS NOTICE OF VIOLATION OR AS DESCRIBED ON HPDâ\u0080\u0099S WEBSITE, WWW.NYC.GOV\\HPD, SEARCH BED BUGS."
[2] "§ 27-2005 ADM CODE PAINT METAL IN ACCORDANCE WITH DEPT. REGULATION RADIATOR AT THE SOUTH WALL IN THE KITCHEN LOCATED AT APT 9C, 9th STORY, 3rd APARTMENT FROM EAST AT SOUTH"
[3] "§ 27-2026 ADM CODE REPAIR THE LEAKY AND/OR DEFECTIVE FAUCETS AT SINK IN THE KITCHENETTE LOCATED AT B-ROOM 3N, 3rd STORY, 1st B-ROOM FROM NORTH AT EAST"
[4] "§ 27-2005, 27-2007, 27-2041.1 HMC: REPLACE OR REPAIR THE SELF-CLOSING DOORS THAT IS MISSING OR DEFECTIVE AT BULKHEAD"
[5] "§ 27-2026 ADM CODE REPAIR THE LEAKY AND/OR DEFECTIVE FAUCETS AT WASHBASIN IN THE BATHROOM LOCATED AT APT 20B, 2nd STORY, 1st APARTMENT FROM NORTH AT EAST"
[6] "§ 27-2046.1 HMC: REPAIR OR REPLACE THE CARBON MONOXIDE DETECTING DEVICE(S). MISSING IN THE ENTIRE APARTMENT LOCATED AT APT 7, 3rd STORY, 1st APARTMENT FROM EAST AT SOUTH"
[7] "§ 27-2045 ADM CODE REPAIR OR REPLACE THE SMOKE DETECTOR MISSING LOCATED AT APT 2A, 2nd STORY, 1st APARTMENT FROM WEST AT NORTH"
[8] "§ 27-2046.1 HMC: REPAIR OR REPLACE THE CARBON MONOXIDE DETECTING DEVICE(S). MISSING IN THE ENTIRE APARTMENT LOCATED AT APT 5C, 5th STORY, 1st APARTMENT FROM WEST AT NORTH"
[9] "§ 27-2045 ADM CODE REPAIR OR REPLACE THE SMOKE DETECTOR MISSING IN THE ENTIRE APARTMENT LOCATED AT APT 5C, 5th STORY, 1st APARTMENT FROM WEST AT NORTH"
[10] "§ 27-2005 ADM CODE REPAIR THE BROKEN OR DEFECTIVE PLASTERED SURFACES AND PAINT IN A UNIFORM COLOR EAST WALL APPROX 4SQ FT AND NORTH WALL APPROX 4 SQ FT IN THE BATHROOM LOCATED AT APT 1A, 1st STORY, 2nd APARTMENT FROM SOUTH AT WEST"
ViolationCategory n
1 PAINT/PLASTER 127336
2 PLUMBING 93154
3 DOOR/WINDOW/LOCK 91860
4 OTHER 53533
5 PEST/SANITATION 50753
6 SMOKE/CO DETECTOR 40227
7 HEAT/HOT WATER 39990
8 REGISTRATION/ADMIN 35133
9 WATER LEAK 23370
10 FLOOR/CEILING 19633
11 FIRE SAFETY 7121
12 GAS/APPLIANCES 4004
13 ELECTRICAL 1607
14 BUILDING SYSTEMS 679
15 BUILDING MANAGEMENT 376
16 MOLD 118
17 ELEVATOR 81
18 VENTILATION 30
Code
# ========== 311: CATEGORY MAPPING (parallel to housing) ==========# Compare with 311 Complaint Typessr311 |>count(Complaint.Type, sort =TRUE) |>print()
Complaint.Type n
1 HEAT/HOT WATER 229222
2 PLUMBING 50637
3 PAINT/PLASTER 44624
4 WATER LEAK 30190
5 General Construction/Plumbing 25207
6 Maintenance or Facility 20202
7 Elevator 16588
8 Plumbing 2197
9 ELEVATOR 1302
10 Non-Residential Heat 1094
11 School Maintenance 1064
12 OUTSIDE BUILDING 857
13 Water Leak 556
14 Heat/Hot Water 487
15 Paint/Plaster 404
16 Mold 402
17 Building Drinking Water Tank 83
18 Sewer Maintenance 68
19 Water Maintenance 31
20 Building Condition 19
21 Unstable Building 14
Category n
1 HEAT/HOT WATER 230803
2 PLUMBING 78041
3 PAINT/PLASTER 45028
4 WATER LEAK 30746
5 ELEVATOR 17890
6 OTHER 17784
7 PEST/SANITATION 4554
8 MOLD 402
Here, manual categorization logic we applied successfully maps unstructured text descriptions (NOVDescriptions) in Housing Violations dataset into unified high-level categories which enables a direct one-to-one comparison between both. We also normalized the categories in 311 Complaints dataset since there were multiple for the same category, ege. there were 2 types “HEAT/HOT WATER” and “Heat/Hot Water” which were merged into 1.
There is a substantial disparity in volume for the “HEAT/HOT WATER” category in violations vs complaints, where the number of 311 Service Requests is significantly higher than the count of official Housing violations issued for the same. Hence, we can conclude that while tenant complaints regarding heating are very very frequent, quite a smaller proportion result in confirmed enforcement actions by management. In other categories, the difference is relatively small. In case of Paint/Plaster and Pests, there is much higher number of building violation entries which shows that probably regular building inspections is conducted for those and hence less direct 311 complaints. Out of all top 7 identified categories overlapping in both the datasets, Heat/Hot Water is the highest and Mold is the lowest complaint type.
Across quarters, HEAT/HOT WATER 311 complaints and housing violations both show a clear seasonal pattern. They spike in Q1, drop sharply in Q2 and Q3 for all 3 years, and rise again Q4 matching winter heating needs. For PLUMBING, PAINT/PLASTER, PEST/SANITATION, WATER LEAK, ELEVATOR, and MOLD, housing violations counts change over the quarters (most show a local peak around Q3), but patterns are less sharp and not perfectly identical so they don’t show a single clear seasonal shape like HEAT/HOT WATER does. Pest and sanitation 311 complaints peak in Q3 in all 3 years, which suggests these problems are most common in late summer, which is also consistent with warmer weather since heat and humidity make it easier for rodents, roaches, and insects to spread, and garbage issues can get worse when it’s so hot. But it’s not as strong or as perfectly consistent across all years. For the other categories for 311 complaints, the quarter‑to‑quarter changes within each panel are relatively small and do not follow a clear, repeated seasonal shape across years.
3.8 Housing Violation Monthly Transition Alluvial for 2024 (for a small sample size)
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
Warning in to_lodes_form(data = data, axes = axis_ind, discern =
params$discern): Some strata appear at multiple axes.
We can see how 2024 housing violations (we have taken a small sample slice of 200 here to visualize better) move from inspection month to NOV (notivr of violation) month to when current status takes effect, with ribbon colors showing violation Class (A least serious, C most serious, I immediately hazardous). Mostly, flows stay within nearby months across 3 stages showing many violations progress through the process within the same part of the year. Thicker orange and blue ribbons show that Class B and C violations make up most of the pipeline, while Class A is thinner and Class I appears as a smaller set of immediately hazardous cases. There are also some blank bands in NOV step which occur when inspection and status dates are recorded but NOVIssuedDate or Class code is missing, so those months or classes appear as NA.
# Check if they're all inspected in Feb or entered in Febsummary(feb_2022$InspectionDate)
Min. 1st Qu. Median Mean 3rd Qu. Max.
"2022-02-01" "2022-02-15" "2022-02-15" "2022-02-15" "2022-02-16" "2022-02-28"
Code
summary(feb_2022$ApprovedDate)
Length Class Mode
28414 character character
Median/Mean inspection date is February 15, 2022 and Most inspections are clustered around Feb 15-16. This is NOT a data error but this appears to be a mass inspection event or systematic sweep by HPD in mid-February 2022.
4.1 Smoothed Temporal View Research Question: What are the overall trends when we smooth out monthly fluctuations?
Code
# Aggregate by quarter for smoother trendhousing_2[, Quarter :=quarter(InspectionDate)]housing_2[, YearQuarter :=paste0(Year, "-Q", Quarter)]viol_by_quarter <- housing_2[, .N, by = .(Year, Quarter, YearQuarter)][order(Year, Quarter)]# Create quarter labelsviol_by_quarter[, QuarterLabel :=paste0(Year, "\nQ", Quarter)]# Plotggplot(viol_by_quarter, aes(x =factor(YearQuarter, levels = YearQuarter), y = N, group =1)) +geom_line(color ="steelblue", linewidth =1.2) +geom_point(color ="steelblue", size =3) +scale_y_continuous(labels = scales::comma) +labs(title ="Housing Violations by Quarter",subtitle ="Manhattan, 2022-2025 (smoothed trend)",x ="Quarter",y ="Number of Violations" ) +theme_minimal(base_size =16) +theme(plot.title =element_text(face ="bold"),axis.text.x =element_text(angle =45, hjust =1) )
4.2 What Types of Violations Are Most Common? Research Question: What are the most common housing violations that tenants and inspectors encounter?
4.3 Violation Severity and Status Research Question: What types of violations are most common, and how seriously are they being addressed?
Code
# === GRAPH 3A: Violation Class Distribution ===# Class definitions: A=Non-Hazardous, B=Hazardous, C=Immediately Hazardous, I=Failure to Registerclass_data <- housing_2[, .N, by=Class][order(-N)]# Add descriptive labelsclass_data[, ClassLabel :=fcase( Class =="A", "Class A: Non-Hazardous", Class =="B", "Class B: Hazardous", Class =="C", "Class C: Immediately Hazardous", Class =="I", "Class I: Failure to Register",default = Class)]ggplot(class_data, aes(x =reorder(ClassLabel, N), y = N, fill = Class)) +geom_bar(stat ="identity") +geom_text(aes(label = scales::comma(N)), hjust =-0.1, size =5) +coord_flip() +scale_y_continuous(labels = scales::comma, expand =expansion(mult =c(0, 0.15))) +labs(title ="Housing Violations by Severity Class",subtitle ="Manhattan, 2022-2025",x ="Violation Class",y ="Number of Violations",caption ="Class A: Non-hazardous | Class B: Hazardous | Class C: Immediately Hazardous" ) +theme_minimal(base_size =16) +theme(plot.title =element_text(face ="bold"),legend.position ="none" )